 linearly separable data


Simplicity Bias of Two-Layer Networks beyond Linearly Separable Data

arXiv.org Machine Learning

Simplicity bias, the propensity of deep models to over-rely on simple features, has been identified as a potential reason for limited out-of-distribution generalization of neural networks (Shah et al., 2020). Despite the important implications, this phenomenon has been theoretically confirmed and characterized only under strong dataset assumptions, such as linear separability (Lyu et al., 2021). In this work, we characterize simplicity bias for general datasets in the context of two-layer neural networks initialized with small weights and trained with gradient flow. Specifically, we prove that in the early training phases, network features cluster around a few directions that do not depend on the size of the hidden layer. Furthermore, for datasets with an XOR-like pattern, we precisely identify the learned features and demonstrate that simplicity bias intensifies during later training stages. These results indicate that features learned in the middle stages of training may be more useful for OOD transfer. We support this hypothesis with experiments on image data.


SVMs for Linearly Separable Data with Python

#artificialintelligence

In our last few articles, we discussed Support Vector Machines: hard and soft margins, and how the kernel trick helps when our data is not linearly separable. In this article, we will only consider how to implement an SVM when our data is linearly separable. The next article will cover the implementation for data that is no longer linearly separable. We will implement our models in a Jupyter Notebook using various libraries.
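
As a rough illustration of the linearly separable case, here is a minimal, dependency-free sketch of a linear SVM trained by subgradient descent on the regularized hinge loss. It is not the article's code: the toy points, the function names `fit_linear_svm` and `predict`, and the values of `lam`, `lr`, and `epochs` are all assumptions chosen for illustration. In a notebook you would more likely reach for a library implementation such as scikit-learn's `SVC` with a linear kernel.

```python
import math

# Hypothetical toy data: two linearly separable classes that mirror
# each other through the origin.
X = [(1, 1), (2, 1), (1, 2), (-1, -1), (-2, -1), (-1, -2)]
y = [1, 1, 1, -1, -1, -1]


def fit_linear_svm(X, y, lam=0.01, lr=0.01, epochs=1000):
    """Full-batch subgradient descent on lam/2 * ||w||^2 + mean hinge loss."""
    w = [0.0, 0.0]
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        gw = [0.0, 0.0]
        gb = 0.0
        for (x0, x1), t in zip(X, y):
            # Only points inside the margin (or misclassified) contribute
            # to the hinge-loss subgradient.
            if t * (w[0] * x0 + w[1] * x1 + b) < 1:
                gw[0] -= t * x0
                gw[1] -= t * x1
                gb -= t
        # The regularization term lam * w keeps the weights small,
        # which corresponds to a wide margin.
        w[0] -= lr * (lam * w[0] + gw[0] / n)
        w[1] -= lr * (lam * w[1] + gw[1] / n)
        b -= lr * gb / n
    return w, b


def predict(w, b, point):
    """Classify a point by the side of the hyperplane it falls on."""
    return 1 if w[0] * point[0] + w[1] * point[1] + b >= 0 else -1


w, b = fit_linear_svm(X, y)
preds = [predict(w, b, p) for p in X]
margin_width = 2.0 / math.hypot(w[0], w[1])  # distance between the two margins

print("weights:", w, "bias:", b)
print("training predictions:", preds)
print("margin width:", margin_width)
```

The small `lam` makes the soft-margin objective behave almost like a hard margin on this data, so the learned hyperplane ends up close to the maximum-margin separator.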


Intuitively, How Do Neural Networks Work?

#artificialintelligence

In my previous article, "Intuitively, how can we understand different classification algorithms", I introduced the main principles of classification algorithms. However, the toy data I used was quite simple and almost linearly separable; in real life, data is almost always non-linear, so our algorithm should be able to tackle data that is not linearly separable. Let's compare how logistic regression behaves on almost linearly separable data and on non-linearly separable data. With the two toy datasets below, we can see that logistic regression finds the decision boundary when the data is almost linearly separable, but when the data is not linearly separable, it cannot find a clear decision boundary. This is understandable, because logistic regression can only split the data into two parts with a single linear boundary.
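
The contrast described above can be reproduced with a small sketch. This is not the article's code or data: it assumes two hypothetical toy datasets on the same four points, labeled once with AND (linearly separable) and once with XOR (not linearly separable), and fits a plain logistic regression by gradient descent. The function names and the values of `lr` and `epochs` are illustrative choices.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=5000):
    """Logistic regression fitted by full-batch gradient descent."""
    w = [0.0, 0.0]
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        gw = [0.0, 0.0]
        gb = 0.0
        for (x0, x1), t in zip(X, y):
            # Gradient of the log loss with respect to the logit.
            err = sigmoid(w[0] * x0 + w[1] * x1 + b) - t
            gw[0] += err * x0
            gw[1] += err * x1
            gb += err
        w[0] -= lr * gw[0] / n
        w[1] -= lr * gw[1] / n
        b -= lr * gb / n
    return w, b


def accuracy(w, b, X, y):
    """Fraction of points whose predicted class (p > 0.5) matches the label."""
    hits = 0
    for (x0, x1), t in zip(X, y):
        pred = 1 if sigmoid(w[0] * x0 + w[1] * x1 + b) > 0.5 else 0
        hits += int(pred == t)
    return hits / len(X)


points = [(0, 0), (0, 1), (1, 0), (1, 1)]
y_and = [0, 0, 0, 1]  # linearly separable labels
y_xor = [0, 1, 1, 0]  # not linearly separable labels

w_and, b_and = fit_logistic(points, y_and)
w_xor, b_xor = fit_logistic(points, y_xor)

acc_and = accuracy(w_and, b_and, points, y_and)
acc_xor = accuracy(w_xor, b_xor, points, y_xor)

print("AND accuracy:", acc_and)
print("XOR accuracy:", acc_xor)
```

On the AND labels a single line separates the classes, so the model can fit every point. On the XOR labels no single line can classify all four points correctly, which is the limitation that motivates moving to neural networks.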


Support Vector Machine (SVM) Tutorial: Learning SVMs From Examples

@machinelearnbot

After the Statsbot team published the post about time series anomaly detection, many readers asked us to tell them about the Support Vector Machines approach. It's time to catch up and introduce you to SVM without hard math and share useful libraries and resources to get you started. If you have used machine learning to perform classification, you might have heard about Support Vector Machines (SVM). Introduced a little more than 50 years ago, they have evolved over time and have also been adapted to various other problems like regression, outlier analysis, and ranking. SVMs are a favorite tool in the arsenal of many machine learning practitioners.


An Introduction to Support Vector Machines - DZone AI

#artificialintelligence

If you have used machine learning to perform classification, you might have heard about support vector machines (SVM). Introduced a little more than 50 years ago, they have evolved over time and have also been adapted to various other problems like regression, outlier analysis, and ranking. SVMs are a favorite tool in the arsenal of many machine learning practitioners. In this post, we will try to gain a high-level understanding of how SVMs work. I'll focus on developing intuition rather than rigor. What that essentially means is we will skip as much of the math as possible and develop a strong intuition of the working principle. Say there is a machine learning (ML) course offered at your university.